Latency and Throughput
Learn about the basic characteristics of a network that help to meet the latency and throughput goals.
Introduction#
We often use high-level abstractions like sockets, message passing, remote procedure calls, or even more advanced constructs through API calls to simplify the complexities involved in network communications. At the same time, applications have different needs in terms of user-perceived latency and throughput. For example, if we have a maximum budget of 500 ms for an API call, and the call needs to traverse cross-continental boundaries, the network consumes about 150 to 250 ms of the available 500 ms. This leaves roughly 250 to 350 ms for the actual processing on the server side and any light processing on the client side.
Knowing what constitutes the latency deepens our understanding, helps us write better API service-level agreements, and points to possible ways to reduce latency. Similarly, what looks like a single message at the application layer might require multiple round trips between the client and the server, and those round trips add to the user-perceived latency. Additionally, fetching multiple objects using a certain version of a protocol might use independent TCP connections—incurring TCP connection setup and teardown costs. Name resolution using DNS can also incur substantial latency if the required name resolution is not already cached, either at the client side or nearby.
We’ll first understand four important characteristics of the network—throughput, latency, jitter, and latency-bandwidth product. After that, we’ll classify applications primarily based on the delay so we can build a respective API that meets the needs of those applications, and so we know how much room we have as a designer.
Throughput#
Throughput refers to the process-to-process logical data rate, measured across a path that typically spans many network hops. If a link or network has an effective throughput of 1 Mbps, we can’t send more than 1 Mb of data in a second; the remaining data has to wait for the next second. In short, throughput measures the amount of data that can be transmitted from the sender to the receiver in a given unit of time.
Let’s assume that some IoT device can only receive and process data at a 1 Mbps rate, while the network can support a rate of up to 2 Mbps. In that case, the IoT device can’t fully utilize what the network provides (2 Mbps).
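The IoT example above can be sketched in a few lines of Python—the effective end-to-end throughput is capped by the slowest element on the path (the function name and numbers here are illustrative, not from any specific library):

```python
def effective_throughput_mbps(network_mbps, device_mbps):
    """The achievable process-to-process rate is the minimum of the two."""
    return min(network_mbps, device_mbps)

# The IoT example: a 2 Mbps network feeding a device that processes 1 Mbps.
print(effective_throughput_mbps(2.0, 1.0))  # 1.0
```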
Latency#
Latency means how long (in terms of time) a user-level message takes to travel from the sender to the receiver. We’re often more concerned about the user-perceived delay, which is a function of the round-trip time (RTT).
Traditionally, throughput and latency have tradeoffs—for example, increasing the throughput or utilization might increase the latency as well. A detailed discussion on this topic is a subject of queuing theory, and is beyond the scope of this course.
Typically, network latency has the following constituent components:
Transmission delay
Propagation delay
Queuing delay
Processing delay
Let's discuss them one by one.
Transmission delay#
The time it takes to place a packet on the transmission link is the transmission delay.
We can calculate this delay using the following formula:
Transmission delay = Packet size / Bandwidth
The size of the packet and the transmission medium's capacity affect the transmission delay: it is directly proportional to the size of the packet and inversely proportional to the bandwidth. If the packet is small and the bandwidth is large, the transmission delay decreases; if the packet is large and the bandwidth is small, it increases. At times, compressing data before sending it over the network helps reduce the packet size.
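As a quick sketch of the formula above (the packet size and link speed are assumed values for illustration):

```python
def transmission_delay_s(packet_size_bits, bandwidth_bps):
    """Time to place the whole packet on the link: size / bandwidth."""
    return packet_size_bits / bandwidth_bps

# A 1500-byte (12,000-bit) packet on a 1 Mbps link takes 12 ms to transmit.
print(transmission_delay_s(12_000, 1_000_000))  # 0.012
```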
Propagation delay#
Propagation delay is the time a bit takes to go from start-host to end-host. For example, in the case of cross-continental links, the propagation delay would be large because bits need to travel from one continent to another.
We can calculate the propagation delay using the following formula—the delay is directly proportional to the distance and inversely proportional to the propagation speed:
Propagation delay = Distance / Propagation speed
To see the implications of distance on propagation delay, take the example of NASA’s Mars rovers. The distance from Earth to Mars is about 134 million km, so a command from Earth takes about 7.4 minutes to reach a rover on Mars, even assuming the data propagates at the speed of light (the fastest anything can travel!). Therefore, NASA operators can’t maneuver the rover in real time from Earth. They rely on the autonomous capabilities of the rover to take a batch of instructions, act on it, and report the results back to Earth.
Note: Data movement speed across media depends on the specific characteristics of the media. For example, for a specific copper wire, information can flow at two-thirds the speed of light.
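The Mars example works out as follows (using the approximate distance from the text and the speed of light in a vacuum):

```python
SPEED_OF_LIGHT_MPS = 3e8  # m/s, approximate

def propagation_delay_s(distance_m, speed_mps):
    """Propagation delay = distance / propagation speed."""
    return distance_m / speed_mps

# Earth to Mars: about 134 million km = 134e9 meters.
delay_s = propagation_delay_s(134e9, SPEED_OF_LIGHT_MPS)
print(delay_s / 60)  # roughly 7.4 minutes
```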
Queuing delay#
Queuing delay is the time needed for a node to hold the packet before it can proceed. The packet is kept in a queue until its turn comes up. At times, the internet is called the “network of queues” because a network packet needs to relay through many network routers to reach the final destination. A packet might need to wait in a queue at a router before it can proceed, just like a traffic intersection, where cars on one side need to wait while others are crossing the intersection.
Queuing delays depend on multiple factors, such as:
The time interval between the packets.
Transmission capacity of the outgoing link.
Current congestion conditions at the routers (networks get crowded as well).
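The factors above can be sketched with a toy FIFO model: packets arrive at given times, the outgoing link takes a fixed time to transmit each one, and a packet queues whenever the link is still busy with earlier packets (all numbers here are assumed for illustration):

```python
def queuing_delays(arrival_times_s, service_s):
    """Per-packet queuing delay at a single FIFO outgoing link."""
    delays = []
    link_free_at = 0.0
    for t in arrival_times_s:
        start = max(t, link_free_at)   # wait if the link is busy
        delays.append(start - t)       # time spent sitting in the queue
        link_free_at = start + service_s
    return delays

# Three packets arrive almost back-to-back; the link needs 1 s per packet,
# so each later packet waits longer—just like cars at a busy intersection.
print(queuing_delays([0.0, 0.1, 0.2], 1.0))  # [0.0, 0.9, 1.8]
```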
Processing delay#
Processing delay is the time the nodes (routers) and the end host (server) take to process the packet. It depends on the processing speed of those devices.
Question
Content caches, like content delivery networks (CDNs), target which component of latency?
Primarily, they target propagation delays. This is because the client fetches data from a CDN location that is nearby, reducing propagation delay. An extreme example of this is Netflix putting movies inside specific ISPs so that the ISP’s customers can get movie data at high bandwidth and very low latency.
Jitter #
Variation in latency is called jitter—a measure of how latency changes from one packet to the next. Say three packets are transmitted with an equal delay between them, but one of them gets delayed along the way; jitter captures that variation.
Jitter occurs due to queuing delays, changes in the route, and so on, and is usually measured in milliseconds. Let's ping `educative.io` by entering `ping -c 5 educative.io` in the terminal and observe the time differences across the five packets.
Note: The `ping` command is used to verify whether a response is received from the specified destination. The `-c` option limits the number of ping requests.
For example, for one run of the `ping` command, the time sequence is 11.0, 10.3, 10.3, 10.3, and 10.2. We have to calculate the absolute difference between them as given below:
| Packets | Time (t1) | Time (t2) | Delay difference \|t1 - t2\| |
| --- | --- | --- | --- |
| 1 and 2 | 11.0 ms | 10.3 ms | 0.7 ms |
| 2 and 3 | 10.3 ms | 10.3 ms | 0 ms |
| 3 and 4 | 10.3 ms | 10.3 ms | 0 ms |
| 4 and 5 | 10.3 ms | 10.2 ms | 0.1 ms |
The jitter is calculated by taking the mean of the delay differences: (0.7 + 0 + 0 + 0.1) / 4 = 0.2 ms.
We care about jitter because, for some applications (for example, a video playback client), larger jitter might mean buffering more data. Usually, a network with low jitter is considered better than one with larger jitter values over time.
Latency-bandwidth product and network utilization#
Sending one packet and then waiting for a response is wasteful; back-to-back packet transmission increases network utilization. To do this well, we should know how many bits a link can hold, which is given by the latency-bandwidth product—the amount of data that can be in transit through the network at any time. The formula for the latency-bandwidth product is:
Latency-bandwidth product = Latency × Bandwidth
For example, if the effective bandwidth measured on the client side is 1 Mbps, and the one-way client-to-service latency is 200 ms, then 1 Mbps × 200 ms = 200,000 bits will be in transit.
If the sender also wants to receive an acknowledgment from the receiver, the relevant latency is the round-trip time (2 × 200 ms = 400 ms), so the amount of bits the network can hold is 1 Mbps × 400 ms = 400,000 bits.
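The example works out like this in Python (the 1 Mbps / 200 ms figures come from the text; treating 200 ms as the one-way latency is an assumption):

```python
def bdp_bits(bandwidth_bps, latency_s):
    """Latency-bandwidth product: bits that can be 'in flight' on the path."""
    return bandwidth_bps * latency_s

one_way = bdp_bits(1_000_000, 0.200)       # bits in transit one way
with_ack = bdp_bits(1_000_000, 2 * 0.200)  # bits over a full round trip
print(round(one_way), round(with_ack))  # 200000 400000
```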
We discussed the network characteristics, and now it's time to discuss the classification of applications, primarily based on delay.
Application classes#
In a client-server model, an application can have distributed pieces of service and the client. If we have real-time delay needs, it might be impossible to achieve them without express cooperation between the client and the service.
Modern network applications are built using APIs, and by understanding the delay needs of the applications, we can design the right APIs. However, these applications are not uniform in their network requirements. For instance, an application might need to:
Fetch a web page having text and images.
Stream recorded audio and video.
Stream live events (video and audio). These applications usually have an upper limit of 300 ms before end users start complaining about non-real time, and a 100 ms delay is considered really good.
Facilitate interactive bidirectional communication.
We divide applications into different classes on the basis of delay, such as elastic and real-time applications. Elastic applications can tolerate delay, while real-time applications can’t tolerate substantial delay. Email is an example of an elastic application: the sender sends the email, and it’s delivered to its recipient with some delay (a few seconds). This is usually okay.
On the other hand, the space shuttle navigation system is an instance of a real-time application that gives a periodic signal about its location to the International Space Station (ISS). The malfunction of such essential systems must be avoided at all costs because it can have serious consequences that directly impact the lives of their users.
Let's take a look at the further classifications of applications.
Type of elastic applications#
Interactive: Any application that allows us to input data and get a response. For example, Telnet.
Interactive bulk: Any application that allows us to input data and receive a large amount of data in response. For example, File Transfer Protocol (FTP).
Asynchronous: Any application that does not wait for the response to give the next instruction. For example, email.
Type of real-time applications#
Hard/intolerant: This doesn’t allow any tolerance for delay, for example, live video conferencing.
Soft/tolerant: This allows a little tolerance for delay, for example, web browsing.
Adaptive: Some applications adjust their functionality according to the network conditions and are considered adaptive. For example, the playback point of a live streaming event. The playback point tells us how long the buffer keeps the packet before it’s played back. If packets arrive within 200 ms, then we can set the playback point to 200 ms. But if packets are suffering from some delay due to network conditions, then the application can adjust the playback point to the possible delay, and it’s referred to as “delay adaptive.” However, if the packet is received after its playback time, it will be discarded from the buffer.
Non-adaptive: Some applications don’t adjust their functionality according to the network conditions. For example, if the playback time is 300 ms and suddenly packets are arriving within 100 ms, the packets still wait for 300 ms before they’re played back.
There’s one more class of adaptive real-time application—rate adaptive. Based on prevailing network conditions—measured as delay, jitter, or some quality-of-experience metric—the application changes the quality of the video. For example, YouTube offers different video quality options according to the available bandwidth: if the bandwidth is large, the video quality increases.
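A rate-adaptive rule can be sketched as follows—pick the highest video quality whose bitrate fits the measured bandwidth (the quality tiers and bitrates below are made-up illustrative numbers, not YouTube's actual values):

```python
# Assumed quality tiers and their required bitrates, in kbps.
QUALITY_KBPS = {"240p": 400, "480p": 1_000, "720p": 2_500, "1080p": 5_000}

def pick_quality(measured_kbps):
    """Choose the best quality that the measured bandwidth can sustain."""
    fitting = [q for q, rate in QUALITY_KBPS.items() if rate <= measured_kbps]
    if not fitting:
        return "240p"  # fall back to the lowest quality
    return max(fitting, key=lambda q: QUALITY_KBPS[q])

print(pick_quality(3_000))  # 720p
print(pick_quality(300))    # 240p
```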
Summary#
Now we know how a typical network behaves and where to look when latency or throughput is not up to our liking or need. If an API call is taking too long, we need to find out its constituent latency components to see which component is primarily giving high latency. For example, if we find the main component is propagation delay (some clients might start using our services from a new country), we might need to shift that specific client to a different data center closer to them, resulting in less propagation delay.
Quiz#
By decreasing the distance between end-to-end devices, which delay decreases?
Transmission delay
Propagation delay
The formula for propagation delay is: Propagation delay = Distance / Propagation speed.
The propagation delay is directly proportional to the distance. If we decrease the distance, the propagation delay decreases.
Processing delay
Queuing delay